In the last decade, exponential data growth has increased the capabilities of machine learning-based algorithms and enabled their use in daily-life activities. This improvement is also partially explained by the advent of deep learning techniques, i.e., stacks of simple architectures that end up in more complex models. Although both factors produce outstanding results, they also pose drawbacks regarding the learning process, as training complex models over large datasets is expensive and time-consuming. This problem is even more evident when dealing with video analysis. Some works have considered transfer learning or domain adaptation, i.e., approaches that map the knowledge from one domain to another, to ease the training burden, yet most of them operate over individual frames or small blocks of frames. This paper proposes a novel approach to map the knowledge from action recognition to event recognition using an energy-based model, denoted as Spectral Deep Belief Network. Such a model can process all frames simultaneously, carrying spatial and temporal information through the learning process. Experimental results on two public video datasets, HMDB-51 and UCF-101, show the effectiveness of the proposed model and its reduced computational burden compared to traditional energy-based models, such as Restricted Boltzmann Machines and Deep Belief Networks.
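For reference, the Restricted Boltzmann Machine baseline named above is an energy-based model whose hidden units can be summed out analytically. The NumPy sketch below shows a standard Bernoulli RBM: its energy, its free energy, and one contrastive-divergence (CD-1) update. It is the conventional RBM, not the Spectral Deep Belief Network; the function names and the learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, b, c):
    """E(v, h) = -b.v - c.h - v^T W h for binary visible v and hidden h."""
    return -v @ b - h @ c - v @ W @ h

def free_energy(v, W, b, c):
    """F(v) = -b.v - sum_j log(1 + exp(c_j + (v W)_j)): E with h summed out."""
    return -v @ b - np.sum(np.logaddexp(0.0, c + v @ W))

def cd1_step(V, W, b, c, lr=0.1, rng=None):
    """One CD-1 update on a batch V of shape (n, n_visible); returns new (W, b, c)."""
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(c + V @ W)                          # P(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled hidden states
    pv1 = sigmoid(b + h0 @ W.T)                       # mean-field reconstruction
    ph1 = sigmoid(c + pv1 @ W)                        # P(h = 1 | v1)
    n = V.shape[0]
    W = W + lr * (V.T @ ph0 - pv1.T @ ph1) / n        # positive minus negative phase
    b = b + lr * (V - pv1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

A Deep Belief Network stacks such RBMs and trains them greedily, layer by layer, which is part of why these baselines become expensive on long video inputs.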
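Deep learning architectures have achieved promising results in different areas (e.g., medicine, agriculture, and security). However, using these powerful techniques in many real applications becomes challenging due to the large labeled collections required during training. Several works have tried to overcome this by proposing strategies that can learn more from less, e.g., weakly and semi-supervised learning approaches. As these approaches do not usually address memorization and sensitivity to adversarial examples, this paper presents three deep metric learning approaches combined with Mixup for incomplete-supervision scenarios. We show that some state-of-the-art approaches in metric learning might not work well in such scenarios. Moreover, the proposed approaches outperform most of them on different datasets.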
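Mixup, which the approaches above combine with deep metric learning, trains on convex combinations of pairs of examples and of their labels, which is known to discourage memorization. A minimal NumPy sketch of the standard mixing step follows; sampling the coefficient from Beta(α, α) is the usual Mixup formulation, while `alpha=0.2` and the function name are illustrative assumptions, and how the paper couples Mixup with its metric-learning losses is not shown.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Mix each example with a randomly permuted partner from the same batch.

    Returns mixed inputs, mixed soft labels, and the mixing coefficient lam.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)              # one coefficient for the whole batch
    perm = rng.permutation(x.shape[0])        # partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam
```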
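Deep learning (DL) is the dominant approach in various computer vision tasks, as it has achieved relevant results on many of them. However, in real-world scenarios with partially labeled or unlabeled data, DL methods are also prone to the well-known domain shift problem. Multi-source unsupervised domain adaptation (MSDA) aims to learn a predictor for an unlabeled domain by leveraging weak knowledge from a bag of source models. However, most works perform domain adaptation by exploiting only the extracted features and reducing their domain shift from the perspective of loss function design. In this paper, we argue that handling domain shift based only on domain-level features is not sufficient; it is also essential to align such information in the feature space. Unlike previous works, we focus on network design and propose to embed a multi-source version of domain alignment layers (MS-DIAL) at different levels of the predictor. These layers are designed to match the feature distributions between different domains and can be easily applied to various MSDA methods. To show the robustness of our approach, we conducted an extensive experimental evaluation considering two challenging scenarios: digit recognition and object classification. The experimental results indicate that our approach can improve state-of-the-art MSDA methods, yielding a relative gain of +30.64% in their classification accuracy.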
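One common way to realize domain alignment layers is domain-specific batch normalization: the features of each domain are standardized with that domain's own batch statistics, so all domains are mapped toward a shared reference distribution before the next layer. The NumPy sketch below illustrates only that basic idea, under this assumption; it is not the MS-DIAL implementation, and the function name, `eps`, and the shared affine parameters `gamma`/`beta` are hypothetical choices.

```python
import numpy as np

def domain_alignment_layer(features, domain_ids, gamma, beta, eps=1e-5):
    """Standardize each domain's features with its own mean and variance.

    features:    (n, d) activations from several domains in one batch
    domain_ids:  (n,) integer domain label for each row
    gamma, beta: (d,) affine parameters shared across domains
    """
    out = np.empty_like(features, dtype=float)
    for dom in np.unique(domain_ids):
        mask = domain_ids == dom
        f = features[mask]
        mu = f.mean(axis=0)                       # per-domain statistics
        var = f.var(axis=0)
        out[mask] = (f - mu) / np.sqrt(var + eps)
    return gamma * out + beta
```

Because the layer only changes normalization statistics, it can be dropped into an existing predictor without altering the loss, which matches the abstract's point that such layers apply to various MSDA methods.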
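Most available image data are stored in a compressed format, of which JPEG is the most widespread. To feed these data to a convolutional neural network (CNN), a preliminary decoding process is required to obtain RGB pixels, which demands a high computational load and memory usage. For this reason, the design of CNNs that process JPEG-compressed data has gained attention in recent years. In most existing works, typical CNN architectures are adapted to learn from DCT coefficients rather than RGB pixels. Although effective, their architectural changes either raise the computational cost or neglect relevant information from the DCT inputs. In this paper, we examine different ways of designing CNNs for DCT inputs, exploiting learning strategies that reduce computational complexity while taking full advantage of the DCT inputs. Our experiments were conducted on the ImageNet dataset. Results show that learning how to combine all DCT inputs in a data-driven fashion is better than discarding them by hand, and that combining this with a reduction in the number of layers effectively lowers the computational cost while retaining accuracy.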
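JPEG stores each 8×8 block of a channel as 64 DCT-II coefficients, and DCT-input CNNs commonly treat those 64 coefficients as channels on a grid downsampled by 8. The NumPy sketch below, which ignores quantization, entropy coding, and chroma subsampling, builds that representation and contrasts the two options compared above: keeping a hand-picked set of low-frequency coefficients versus a learned linear combination of all 64 (a 1×1 projection whose weights `W` would be trained; in the usage below they are placeholders). Function names and sizes are illustrative assumptions.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def blockwise_dct(img, n=8):
    """(H, W) image with H, W divisible by n -> (H/n, W/n, n*n) coefficients."""
    h, w = img.shape
    d = dct_matrix(n)
    blocks = img.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)  # (bh, bw, n, n)
    coeffs = d @ blocks @ d.T          # 2-D DCT of every block: D X D^T
    return coeffs.reshape(h // n, w // n, n * n)

def keep_low_frequencies(coeffs, k, n=8):
    """Hand-crafted reduction: keep only the k x k lowest-frequency coefficients."""
    idx = [u * n + v for u in range(k) for v in range(k)]
    return coeffs[..., idx]

def learned_mix(coeffs, W):
    """Data-driven reduction: 1x1 projection of all n*n coefficients to W.shape[1] channels."""
    return coeffs @ W
```

Both reductions shrink the channel count fed to the first convolution; the learned projection can still draw on every frequency, whereas the hand-crafted one discards the rest outright.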
Due to the environmental impacts caused by the construction industry, repurposing existing buildings and making them more energy-efficient has become a high-priority issue. However, a legitimate concern of land developers is the buildings' state of conservation. For that reason, infrared thermography has been used as a powerful tool to characterize these buildings' state of conservation by detecting pathologies such as cracks and humidity. Thermal cameras detect the infrared radiation emitted by materials and translate it into temperature-color-coded images. Abnormal temperature changes may indicate the presence of pathologies; however, interpreting thermal images is not always straightforward. This research project aims to combine infrared thermography and machine learning (ML) to help stakeholders determine the viability of reusing existing buildings by identifying their pathologies and defects more efficiently and accurately. In this phase of the project, we used a Deep Convolutional Neural Network (DCNN) image classification model to differentiate three levels of cracks in one particular building. The model's accuracy was compared across MSX and thermal images acquired from two distinct thermal cameras, as well as fused images (formed through multisource information), to test the influence of the input data and network on the detection results.
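As a rough illustration of pixel-level multisource fusion, the sketch below adds high-frequency edge detail from a registered visible-light image to a normalized thermal image, which is the general idea behind MSX-style overlays. It is a generic approximation, not FLIR's MSX algorithm nor the fusion used in this study; the gradient-magnitude edge detector and the weight `alpha` are assumptions, and both images are assumed to be aligned and of equal size.

```python
import numpy as np

def normalize(img):
    """Scale an array to [0, 1]; a constant image maps to all zeros."""
    lo, hi = float(img.min()), float(img.max())
    return np.zeros_like(img, dtype=float) if hi == lo else (img - lo) / (hi - lo)

def edge_magnitude(gray):
    """Gradient magnitude from central differences (np.gradient)."""
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy)

def fuse_thermal_visible(thermal, visible_gray, alpha=0.3):
    """Overlay visible-image edges on normalized temperatures, clipped to [0, 1]."""
    t = normalize(thermal)
    e = normalize(edge_magnitude(visible_gray))
    return np.clip(t + alpha * e, 0.0, 1.0)
```

The fused array can then be fed to the same classifier as the plain thermal image, which is the kind of input comparison the study describes.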
In recent years, image and video delivery systems have begun integrating deep learning super-resolution (SR) approaches, leveraging their unprecedented visual enhancement capabilities while reducing reliance on networking conditions. Nevertheless, deploying these solutions on mobile devices remains an open challenge, as SR models are highly demanding in terms of workload and memory footprint. Despite recent progress on on-device SR frameworks, existing systems either penalize visual quality, lead to excessive energy consumption, or make inefficient use of the available resources. This work presents NAWQ-SR, a novel framework for the efficient on-device execution of SR models. Through a novel hybrid-precision quantization technique and a runtime neural image codec, NAWQ-SR exploits the multi-precision capabilities of modern mobile NPUs to minimize latency while meeting user-specified quality constraints. Moreover, NAWQ-SR selectively adapts the arithmetic precision at run time to equip the SR DNN's layers with wider representational power, improving visual quality beyond what was previously possible on NPUs. Altogether, NAWQ-SR achieves an average speedup of 7.9x, 3x and 1.91x over the state-of-the-art on-device SR systems that use heterogeneous processors (MobiSR), CPU (SplitSR) and NPU (XLSR), respectively. Furthermore, NAWQ-SR delivers an average of 3.2x speedup and 0.39 dB higher PSNR over status-quo INT8 NPU designs but, most importantly, mitigates the negative effects of quantization on visual quality, setting a new state of the art in the attainable quality of NPU-based SR.
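The core trade-off behind hybrid-precision quantization can be shown with a toy example: run each layer at the narrowest bit-width whose quantization error stays within a quality budget, and fall back to a wider format otherwise. The sketch below uses symmetric uniform quantization and a per-tensor PSNR threshold as the quality proxy. This greedy per-layer rule, the names, and the thresholds are assumptions for illustration; they are not NAWQ-SR's actual search or its runtime neural image codec.

```python
import numpy as np

def quantize_symmetric(x, bits):
    """Uniform symmetric quantization to signed `bits`-bit integers, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax if np.any(x) else 1.0
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def psnr(ref, test, peak=None):
    """Peak signal-to-noise ratio in dB; infinite when the arrays match exactly."""
    peak = np.max(np.abs(ref)) if peak is None else peak
    mse = np.mean((ref - test) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def choose_precisions(layers, candidates=(8, 16), min_psnr_db=40.0):
    """Pick, per layer, the narrowest bit-width that meets the PSNR budget."""
    chosen = {}
    for name, tensor in layers.items():
        chosen[name] = candidates[-1]          # widest format as the fallback
        for bits in candidates:                # candidates assumed sorted ascending
            if psnr(tensor, quantize_symmetric(tensor, bits)) >= min_psnr_db:
                chosen[name] = bits
                break
    return chosen
```

For uniformly distributed values, 8-bit quantization gives roughly 53 dB by this measure, so a 40 dB budget keeps INT8 while a 60 dB budget forces INT16; the same rule applied per layer yields a mixed-precision assignment.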
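Robust learning is an important issue in Scientific Machine Learning (SciML). There are several works in the literature addressing this topic. However, there is an increasing demand for methods that can simultaneously consider all the different uncertainty components involved in SciML model identification. Hence, this work proposes a comprehensive methodology for uncertainty evaluation in SciML that also considers several possible sources of uncertainty involved in the identification process. The uncertainties considered in the proposed method are the absence of theory and causal models, the sensitivity to data corruption or imperfection, and the computational effort. Therefore, it was possible to provide an overall strategy for uncertainty-aware models in the SciML field. The methodology is validated through a case study: the development of a soft sensor for a polymerization reactor. The results show that the identified soft sensor is robust to uncertainties, corroborating the consistency of the proposed approach.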
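One of the uncertainty sources listed above, sensitivity to data corruption, can be probed with a simple Monte Carlo check: refit the soft sensor many times on training data perturbed by noise and measure the spread of its predictions. The sketch below does this for a linear least-squares soft sensor; the linear model, the Gaussian noise level, and the number of repetitions are illustrative assumptions, not the methodology proposed in the paper.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept column; returns [intercept, weights...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def corruption_sensitivity(X, y, X_eval, noise_std=0.05, n_runs=200, seed=0):
    """Refit on noisy copies of (X, y); return mean and std of predictions on X_eval."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_runs, len(X_eval)))
    for r in range(n_runs):
        Xn = X + rng.normal(0.0, noise_std, size=X.shape)   # corrupted process inputs
        yn = y + rng.normal(0.0, noise_std, size=y.shape)   # corrupted lab measurements
        preds[r] = predict_linear(fit_linear(Xn, yn), X_eval)
    return preds.mean(axis=0), preds.std(axis=0)
```

A small prediction spread relative to the measurement range suggests the sensor is robust to this kind of corruption; the other uncertainty sources named in the abstract would need separate analyses.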
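The rapid development of social robots stimulates active research in human motion modeling, interpretation and prediction, proactive collision avoidance, human-robot interaction, and coexistence in shared spaces. Modern approaches to these goals require high-quality datasets for training and evaluation. However, most available datasets suffer from either inaccurate tracking data or unnatural, scripted behavior of the tracked people. This paper attempts to fill this gap by providing high-quality tracking information from motion capture, eye-gaze trackers, and on-board robot sensors in a semantically rich environment. To induce natural behavior of the recorded participants, we use a loosely scripted task assignment, which leads the participants to navigate through the dynamic laboratory environment in a natural and purposeful way. The motion dataset presented in this paper sets a high quality standard, as the realistic and accurate data is enhanced with semantic information, enabling the development of new algorithms that rely not only on tracking information but also on contextual cues from the moving agents and from the static and dynamic environment.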
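Tracking datasets like this are typically used to benchmark trajectory predictors with displacement errors. A minimal sketch of the constant-velocity baseline and the common average/final displacement error (ADE/FDE) metrics follows; the observation and prediction lengths are assumptions, and neither the baseline nor the metric definitions are taken from the paper.

```python
import numpy as np

def constant_velocity_forecast(observed, n_future):
    """Extrapolate the last observed velocity. observed: (T_obs, 2) positions."""
    velocity = observed[-1] - observed[-2]          # displacement per time step
    steps = np.arange(1, n_future + 1)[:, None]
    return observed[-1] + steps * velocity

def ade_fde(pred, truth):
    """Average and final Euclidean displacement error between (T, 2) trajectories."""
    dist = np.linalg.norm(pred - truth, axis=1)
    return dist.mean(), dist[-1]
```

Algorithms that exploit the semantic and contextual cues described above are usually expected to beat this baseline, especially on trajectories that turn or stop.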
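When using vision-based approaches to classify individual parking spaces as occupied or empty, human experts often need to annotate the space locations and label a training set of images collected in the target parking lot to fine-tune the system. We propose to investigate three annotation types (polygons, bounding boxes, and fixed-size squares), which provide different data representations of the parking spaces. The rationale is to elucidate the best trade-off between handcrafted annotation precision and model performance. We also investigate the number of annotated parking spaces needed to fine-tune a pre-trained model to the target parking lot. Experiments on the PKLot dataset show that a model can be fine-tuned to the target parking lot with fewer than 1,000 labeled samples using low-precision annotations such as fixed-size squares.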
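The three annotation types differ in how much of the space geometry they keep. The sketch below derives the two cheaper representations from a polygon annotation: its axis-aligned bounding box and a fixed-size square, plus a crop helper. Centering the square on the mean of the polygon's vertices and the 40-pixel side are assumptions for illustration; in practice an annotator would place the square directly, and the paper's exact conventions may differ.

```python
import numpy as np

def polygon_to_bbox(poly):
    """poly: (k, 2) array of (x, y) vertices -> (x_min, y_min, x_max, y_max)."""
    x_min, y_min = poly.min(axis=0)
    x_max, y_max = poly.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)

def polygon_to_square(poly, side=40.0):
    """Fixed-size square centered on the mean of the polygon's vertices."""
    cx, cy = poly.mean(axis=0)
    half = side / 2.0
    return float(cx - half), float(cy - half), float(cx + half), float(cy + half)

def crop(image, box):
    """Crop an (H, W[, C]) image to an (x_min, y_min, x_max, y_max) box, clamped to bounds."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    return image[max(y0, 0):min(y1, h), max(x0, 0):min(x1, w)]
```

Each crop would then be resized and passed to the occupied/empty classifier; the square needs only one click per space, which is why it is the cheapest of the three to annotate.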
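TensorFlow GNN (TF-GNN) is a scalable library for Graph Neural Networks in TensorFlow. It is designed from the bottom up to support the kinds of rich heterogeneous graph data that occur in today's information ecosystems. Many production models at Google use TF-GNN, and it has recently been released as an open-source project. In this paper, we describe the TF-GNN data model, its Keras modeling API, and relevant capabilities such as graph sampling, distributed training, and accelerator support.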
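TF-GNN's data model represents a heterogeneous graph as named node sets and edge sets, where each edge set connects a source node set to a target node set and each set carries its own features. The plain NumPy sketch below mimics that idea and runs one round of mean-aggregation message passing along a single edge set. The class and function names are hypothetical and are not the TF-GNN API (whose central type is `GraphTensor`); consult the library's documentation for the real interfaces.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class EdgeSet:
    source: str              # name of the source node set
    target: str              # name of the target node set
    src_idx: np.ndarray      # (E,) source node index of each edge
    tgt_idx: np.ndarray      # (E,) target node index of each edge

@dataclass
class HeteroGraph:
    node_features: dict = field(default_factory=dict)  # node-set name -> (N, d) array
    edge_sets: dict = field(default_factory=dict)       # edge-set name -> EdgeSet

def mean_messages(graph, edge_set_name):
    """Average the source-node features arriving at each target node (zeros if none)."""
    es = graph.edge_sets[edge_set_name]
    src_feats = graph.node_features[es.source]
    n_tgt = graph.node_features[es.target].shape[0]
    summed = np.zeros((n_tgt, src_feats.shape[1]))
    np.add.at(summed, es.tgt_idx, src_feats[es.src_idx])   # unbuffered scatter-add per edge
    counts = np.bincount(es.tgt_idx, minlength=n_tgt)[:, None]
    return summed / np.maximum(counts, 1)
```

Keeping separate feature arrays per node set is what lets, for example, users and items carry features of different widths, which a single homogeneous adjacency matrix cannot express directly.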